Abstract

We apply a multiphase strategy for pedigree-based genetic analysis of systolic blood pressure data collected in a 
longitudinal study of large Mexican American pedigrees. In the first phase, we conduct variance-components 
linkage analysis to identify regions that may harbor quantitative trait loci. In the second phase, we carry out 
pedigree-based association analysis in a selected region with common and low-frequency variants from genome- 
wide association studies and whole genome sequencing data. Using sequencing data, we compare approaches to 
pedigree analysis in a 10 megabase candidate region on chromosome 3 harboring a gene previously identified by 
a consortium for blood pressure genome-wide association studies. We observe that, as expected, the measured 
genotype analysis tends to provide larger signals than the quantitative transmission disequilibrium test. We also 
observe that while linkage signals are contributed by common variants, strong associations are found mainly at 
rare variants. Multiphase analysis can improve computational efficiency and reduce the multiple testing burden. 



Background 

In pedigree-based studies, discovery of genomic regions 
harboring genetic determinants of quantitative traits such 
as systolic blood pressure (SBP) has conventionally been
conducted using linkage analysis based on identity-by- 
descent allele sharing. In the genome-wide association 
studies (GWAS) era of cost-effective high-throughput 
genotyping technology, the mapping of the genetic basis 
of complex traits/diseases in human populations has 
been population-based in unrelated individuals, and lar- 
gely case-control or cross-sectional in design. With the 
advent of next-generation sequencing technology, inves- 
tigators are able to examine each single base pair (bp) 
and test for association with a trait, but the massive 
amount of variant information available for analysis can 
be overwhelming. With the development of techniques 
for pedigree-based imputation from sequence data on
selected pedigree members, pedigree-based analysis of 
whole genome sequencing data is feasible. 

We demonstrate that multiphase analysis in pedigrees 
can be an efficient strategy for identifying genetic variants 
underlying a quantitative trait, in which region discovery 
by linkage analysis of GWAS single-nucleotide poly- 
morphism (SNP) markers with high minor allele fre- 
quency (MAF) is followed by region refinement with 
densely distributed GWAS SNPs and/or fine mapping 
with sequence variants in identified regions. Using a 
summary phenotype derived from longitudinal measure- 
ments of SBP together with GWAS and whole genome 
sequencing genotype data from the San Antonio Family 
Studies (SAFS) as provided by Genetic Analysis Work- 
shop 18 (GAW18), we report pedigree-based linkage and 
association analysis conducted to identify genetic variants 
underlying SBP. Our multiphase analyses are carried out 
in 3 steps, as illustrated by the workflow in Figure 1. 
First, we obtain a summary phenotype for each individual 
Figure 1 Workflow for the multiphase linkage and association analysis of a complex pedigree study with GWAS SNP and whole
genome sequence data



using the residuals from a censored normal regression 
model with a random intercept for each pedigree, where 
the censoring indicator is antihypertensive medication. In 
the second step, we conduct linkage analysis on chromo- 
some 3 with a sample of GWAS SNP markers (MAF > 
5%). We detect linkage at a locus in a region harboring a 
candidate SNP, rs419076 (bp: 169100886, near MECOM, 
3q26) identified in a pathway influencing blood pressure 
and cardiovascular disease risk by the International Consortium
for Blood Pressure Genome-Wide Association
Studies (ICBP-GWAS) [1]. In step 3, we conduct pedi- 
gree-based association analysis using sequence data to 
fine-map the MECOM genomic region. 

Methods 

SAFS pedigree data 

From a total of 1389 participants in 20 pedigrees, 932
have SBP measurements from at least 1 and up to 4 study
exams. Characteristics recorded include sex, year of
exam, age at each exam, current use of antihypertensive 
medications, and current tobacco smoking. GWAS geno- 
types were assayed in a total of 959 individuals, with a 
total of 65,519 GWAS SNPs on chromosome 3 available 
for analysis. Among these individuals, 464 were also 
sequenced at an average 60× coverage, resulting in
1,215,399 sequence variants on chromosome 3. For the 
remaining 495 individuals, the missing genotypes at the 
sequence variants were imputed using a novel popula- 
tion-based imputation approach [2]. Because the pro- 
gram SOLAR required genotype data, in the focused 
association analysis following the linkage scan, we used 
the imputed "best guess" sequence genotypes. Subse- 
quent analyses ignored imputation uncertainty. 

Phenotype adjustment 

Antihypertensive medication complicates the analysis of
SBP, because patients prescribed medication tend to have
elevated underlying SBP values. Based on a novel extension
developed by Konigorski et al [3], we treated medication as
a right-censoring indicator, such that the unmodified SBP
for an individual under medication is higher than the
observed value, and fit a censored normal regression model to
the observed SBP measurements for each exam, assuming
noninformative censoring. In addition, we took into
account the between-pedigree variation by incorporating a
pedigree-specific random component. Analyzing each of
the first 3 visits separately, we included sex, exam-specific
age, and smoking status as covariates. Let $Y$ be the
observed SBP and $\hat{Y}$ be the fitted SBP from the censored
model given exam-specific covariates and pedigree-specific
random effects. For an individual receiving medication, let
$Y^*$ be the conditional expectation of the underlying SBP
given exam-specific covariates and pedigree-specific random
effects, assuming that the underlying unmodified
SBP is greater than the observed value; for details see
Konigorski et al [3]. We computed residuals at each exam as
$Y - \hat{Y}$ if an individual was not under medication, and as
$Y^* - \hat{Y}$ otherwise. The mean of the residuals at exams 1 to
3, denoted by R, was then used as an adjusted phenotype
for each individual in subsequent stages of linkage and
association analysis.
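
The residual calculation described above can be illustrated with a minimal Python sketch. It assumes that the fitted value and residual standard deviation from the censored regression are already available, that the underlying SBP is normally distributed around the fitted value, and that each residual is taken relative to the fitted value. The function names and the numeric inputs are hypothetical and are not taken from the study data.

```python
import math


def std_normal_pdf(z):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_underlying_sbp(observed, fitted, sigma):
    """E[Y* | Y* > observed] when Y* ~ Normal(fitted, sigma^2)."""
    z = (observed - fitted) / sigma
    upper_tail = 1.0 - std_normal_cdf(z)
    return fitted + sigma * std_normal_pdf(z) / upper_tail


def exam_residual(observed, fitted, sigma, on_medication):
    """Residual for one exam; treated observations are right-censored."""
    if on_medication:
        return expected_underlying_sbp(observed, fitted, sigma) - fitted
    return observed - fitted


def adjusted_phenotype(exams):
    """Mean residual over the exams an individual attended."""
    residuals = [exam_residual(*exam) for exam in exams]
    return sum(residuals) / len(residuals)


# Hypothetical individual: (observed SBP, fitted SBP, sigma, on medication)
exams = [
    (128.0, 120.0, 15.0, False),
    (135.0, 122.0, 15.0, True),
    (140.0, 125.0, 15.0, True),
]
R = adjusted_phenotype(exams)
```

Because the conditional expectation of a right-censored value always exceeds the observed value, a treated exam contributes a larger residual than it would if the medication were ignored.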

Variance component linkage analysis 

To detect regions with potential loci for SBP, we applied
the variance-component linkage method for pedigree-based
analysis [4]. In an additive polygenic model, the
overall phenotypic covariance matrix $\Sigma$ for a pedigree of
$n$ members is partitioned into a locus-specific variance
component ($\sigma_q^2$), an additive genetic variance attributable
to an unspecified number of remaining loci at
unknown locations in the genome ($\sigma_g^2$), and an environmental
variance component ($\sigma_e^2$). Specifically, the phenotypic
covariance matrix has the form

$$\Sigma = \Pi\sigma_q^2 + 2\Phi\sigma_g^2 + I_n\sigma_e^2,$$

where the elements of the structuring matrix for the
locus-specific variance, $\Pi$, are proportions representing
the identity-by-descent (IBD) sharing of alleles for each
relative pair at this locus; the structuring matrix for the
additive genetic variance component, $2\Phi$, is twice the
kinship coefficient matrix; and the matrix for the variance
resulting from unshared environmental effects is
specified by the identity matrix $I_n$. To examine the influence
of GWAS SNP density on linkage analysis, we
sampled 3 sets of SNPs. Initially, a total of 988 SNP 
markers was randomly sampled from chromosome 3 
GWAS SNPs with MAF >5%. To allay concerns about 
adequacy of SNP density, in the second and third sam- 
plings, we randomly sampled 1620 and 2999 SNPs, 
respectively, excluding previously sampled SNPs and
using the same MAF criteria. We first performed quan- 
titative genetic analysis to create a suitable null model 
for each selected marker [4]. Applying the genetic analy- 
sis software SOLAR to the sampled GWAS data, we 
estimated IBD allele sharing for all pairs of relatives in 
each pedigree, using single-marker estimation to ease 
computation in the very complex pedigrees. We also 
performed 2-point rather than multipoint linkage analy- 
sis and computed the log of odds (LOD) score for each 
marker. Regions with LOD >1.2 were considered inter- 
esting for subsequent fine mapping analyses. For 
demonstration purposes, in this paper we focused fine- 
mapping analyses on the candidate region 165 to 175 
megabases (Mb) on chromosome 3. 
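
As a minimal sketch of the variance-component model described in this section, the following Python code assembles the phenotypic covariance matrix from its locus-specific, polygenic, and environmental parts and converts a pair of maximized log-likelihoods into a LOD score. The two-sibling pedigree, the variance values, and the log-likelihoods are hypothetical; in practice the IBD estimates and likelihoods come from the linkage software, and the function names here are illustrative only.

```python
import math


def phenotypic_covariance(ibd, kinship, var_q, var_g, var_e):
    """Covariance = IBD * var_q + 2 * kinship * var_g + identity * var_e."""
    n = len(kinship)
    return [
        [
            ibd[i][j] * var_q
            + 2.0 * kinship[i][j] * var_g
            + (var_e if i == j else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]


def lod_score(loglik_linkage, loglik_null):
    """LOD from natural-log likelihoods with and without the locus term."""
    return (loglik_linkage - loglik_null) / math.log(10.0)


# Hypothetical pair of full siblings: kinship 0.25 with each other and
# 0.5 with themselves; assumed IBD proportion of 0.5 at the test locus.
kinship = [[0.5, 0.25], [0.25, 0.5]]
ibd = [[1.0, 0.5], [0.5, 1.0]]
sigma = phenotypic_covariance(ibd, kinship, var_q=20.0, var_g=60.0, var_e=120.0)

# Hypothetical maximized log-likelihoods for the linkage and null models.
lod = lod_score(-1000.0, -1003.0)
flagged = lod > 1.2  # screening threshold used in this study
```

Setting the locus-specific variance to zero recovers the null (polygenic) model, so the LOD score measures how much the IBD-structured term improves the fit.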

Family-based association analysis 

In a candidate region on chromosome 3 identified with 
some evidence for linkage in the sampled GWAS data 
and previously reported in GWAS meta-analysis [1], we 
compared the linkage signals to the association analyses 
implemented in SOLAR: measured genotype (MG) ana- 
lysis and the quantitative transmission disequilibrium 
test (QTDT) [5], in which the phenotype, R, is modeled 
as a linear combination of fixed effects (ie, genotype 
scores) and random effects (ie, polygenic and linkage 
components). The genotype scores are decomposed into 
between-family (b) and within-family (w) components, 
resulting in the fixed-effect model $E(R) = \mu + \beta_b b + \beta_w w$.
The MG approach estimates regression coefficients with
the constraint $\beta_b = \beta_w$. The QTDT approach estimates
both $\beta_b$ and $\beta_w$ and tests whether the within-family
parameter is significantly different from 0. QTDT 
reflects the correlation between SNP genotype and phe- 
notype within families and is robust to population strati- 
fication effects [5], which can be a concern for MG, but 
QTDT is less powerful than MG. We computed the 
IBD allele sharing among pedigree members at each 
sequence variant in the candidate region, and then per- 
formed association tests simultaneously modeling link- 
age as a variance component based on the IBD sharing 
estimates. When linkage is present, including the linkage 
component in the association analysis helps control type 
I error [6]. 
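
The decomposition of genotype scores into between-family and within-family components can be sketched as follows. This simplified Python example takes the between-family component to be the family mean genotype and the within-family component to be each individual's deviation from that mean; the definition used by the software may differ (for example, by using parental genotypes where available), and the identifiers and genotypes below are hypothetical.

```python
from collections import defaultdict


def decompose_genotypes(genotypes, family_of):
    """Split each genotype score into (between, within) family parts."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for person, score in genotypes.items():
        family = family_of[person]
        totals[family] += score
        counts[family] += 1

    components = {}
    for person, score in genotypes.items():
        family = family_of[person]
        between = totals[family] / counts[family]  # family mean
        within = score - between                   # deviation from the mean
        components[person] = (between, within)
    return components


# Hypothetical genotype scores (copies of the minor allele: 0, 1, or 2).
genotypes = {"a1": 0, "a2": 1, "a3": 2, "b1": 2, "b2": 2}
family_of = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B"}
components = decompose_genotypes(genotypes, family_of)
```

Because the two components sum to the genotype score, constraining their coefficients to be equal reduces the model to a regression on the total genotype score, whereas testing only the within-family coefficient uses variation that is unaffected by differences between families.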

Results and discussion 

Linkage scan 

With the first set of 988 GWAS SNPs, evidence for link- 
age with SBP on chromosome 3 using combined pedi- 
gree data was mainly found in 4 regions: 5 to 12 Mb, 47 
to 59 Mb, 89 to 115 Mb, and 165 to 175 Mb (Figure 2), 
with a chromosome-wide maximum LOD score of 1.41. 
These regions harbor SNP associations identified in a 
study undertaken by the ICBP-GWAS [1]. In conducting
sensitivity analysis using 2 additional sets of randomly
sampled GWAS SNPs, we observed multiple linkage 
peaks in similar regions. The maximum LOD scores for 
the second and third linkage analyses were 1.50 and 
1.63. Although differences in the maximum LOD score 
among the 3 analyses were not substantial (ie, around 
0.23), the maximum LODs did not always correspond to 
the same region (Table 1). We obtained the names of 
genes nearest these locations using the annotation 
report from Nalpathamkalam et al [7]. 
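
The sampling scheme behind the 3 SNP sets (random draws of common SNPs, each excluding previously sampled markers) can be sketched in Python as below. The marker names, allele frequencies, set sizes, and random seed are hypothetical and serve only to illustrate the procedure.

```python
import random


def sample_snp_sets(maf_by_snp, set_sizes, maf_min=0.05, seed=0):
    """Draw successive random SNP sets with MAF > maf_min, without reuse."""
    rng = random.Random(seed)
    eligible = sorted(snp for snp, maf in maf_by_snp.items() if maf > maf_min)
    used = set()
    snp_sets = []
    for size in set_sizes:
        # Exclude markers already chosen for an earlier set.
        pool = [snp for snp in eligible if snp not in used]
        chosen = rng.sample(pool, size)
        used.update(chosen)
        snp_sets.append(chosen)
    return snp_sets


# Hypothetical marker panel with MAF values between 0.00 and 0.49.
maf_by_snp = {f"snp{i}": (i % 50) / 100 for i in range(1000)}
snp_sets = sample_snp_sets(maf_by_snp, set_sizes=(10, 20, 30))
```

Keeping the sets disjoint means that agreement between linkage peaks across sets cannot be explained by shared markers.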

Association 

Based on our linkage results and prior report by ICBP- 
GWAS [1], we fine-mapped the 10-Mb chromosomal 
region (165 to 175 Mb) surrounding the SNP rs419076 in 
the gene MECOM (3q26). Among the 58,651 variants in 
this region, 20,211 are common (MAF >5%), 10,508 are 
low-frequency (1% to 5% MAF), and 27,932 are rare 
(MAF <1%). We observed that, as expected, the MG 
association analysis tended to provide larger signals than 
the QTDT approach (Figure 3). To assess for global infla- 
tion of type I error in the MG and QTDT approaches, we 
conducted association analysis using the 2999 GWAS SNPs of the
third sample. No inflation of type I error was observed
in the Q-Q plots for MG, either with or without a linkage 
variance component. However, the observed type I error 
rate from the QTDT approach appeared to be slightly 
deflated, particularly when linkage was included as a variance
component (data not shown). This suggests a lack of
population stratification and is consistent with the theoretical
expectation that the QTDT approach is less powerful than MG
for detecting association. Comparing linkage and association
results across the 3 variant MAF categories, we
observed that linkage signals were contributed by com- 
mon variants (Figure 3 and Table 2, with the max LOD 
score observed at bp position 166324439). However, 
stronger associations were mainly found at rare variants, 
suggesting the linkage peak may correspond to a haplotype
block harboring rare variants underlying blood pressure.
The strongest signal was observed at bp position
172046675 with an MG p value of $1.55 \times 10^{-7}$ (Table 2).
Because the analysis was conducted in a candidate region
partially selected by independent prior data, we did not
require genome-wide significant association, but the appropriate
criterion in this setting remains an open question.
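
The 3 MAF categories used in this comparison can be expressed as a small Python helper. The handling of values exactly at the 1% and 5% boundaries is an assumption, since the text does not specify it; the example MAF values are those listed in Table 2, and the category counts are those reported above for the 10-Mb region.

```python
from collections import Counter


def maf_class(maf):
    """Classify a variant by minor allele frequency (MAF)."""
    if maf > 0.05:
        return "common"
    if maf >= 0.01:  # boundary handling is an assumption
        return "low-frequency"
    return "rare"


# MAF values of variants listed in Table 2.
example_mafs = [0.118, 0.118, 0.0021, 0.0037, 0.0026, 0.0021, 0.0010]
counts = Counter(maf_class(maf) for maf in example_mafs)

# Category counts reported in the text for the 165 to 175 Mb region.
reported = {"common": 20211, "low-frequency": 10508, "rare": 27932}
total_variants = sum(reported.values())
rare_fraction = reported["rare"] / total_variants
```

The reported counts sum to the 58,651 variants in the region, of which just under half are rare.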

Conclusions 

The main purpose of the proposed multiphase design is 
to first identify interesting genomic regions for a com- 
plex quantitative trait, and then to fine-map those 
regions in follow-up studies, reducing both the number 
of tests for association conducted at null variants and 
the computational processing time. With randomly 
sampled common GWAS SNP data for large Mexican 
Figure 2 Two-point linkage analysis for all pedigrees using 3 sets of randomly sampled GWAS SNPs. The blue, red, and green linkage
profiles are for samples of 988, 1620, and 2999 GWAS SNPs, respectively. The beige and yellow regions correspond to 27.5 ± 5 Mb and 170.5 ± 
5 Mb, respectively, and the light gray and light blue horizontal lines denote LOD = 1.0 and LOD = 1.5, respectively. 



Table 1 Results of 2-point linkage analysis with LOD >1.20, ordered by position, using 3 sets of randomly sampled
common GWAS SNPs (MAF >0.05) from chromosome 3. LOD scores in bold denote values > 1.35 (column 5).

Marker       Position (Mb)   Position (cM)   SNP set   LOD (>1.2)   Class        Gene
rs304094     4.51744         13.9988         2         1.24         Intergenic   SUMF1
rs17044432   6.28766         18.6921         2         1.25         Intergenic   GRM7
rs2213215    60.64305        82.6846         3         1.43         Intronic     FHIT
rs9816856    73.18624        100.8683        3         1.22         Intergenic   PPP4R2
rs6798130    74.10858        102.7685        2         1.23         Intergenic   CNTN3
rs7631179    82.55674        108.3900        3         1.29         Intergenic   GBE1
rs13093396   86.16787        108.8700        2         1.44         Intergenic   CADM2
rs9860570    86.28946        108.9715        1         1.24         Intergenic   CADM2
rs1598234    94.00416        110.3673        1         1.34         Intergenic   NSUN3
rs11719592   100.25710       111.9176        2         1.50         Intronic     TMEM45A
rs4928048    100.27040       111.9277        3         1.20         Intronic     TMEM45A
rs4618204    101.28150       112.7198        3         1.25         Intronic     TRMT10C
rs16844883   108.49870       118.6100        2         1.38         Intergenic   RETNLB
rs323629     151.92760       161.0093        3         1.25         Intergenic   LOC401093
rs1533913    152.70880       162.0010        3         1.63         Intergenic   P2RY1
rs10935963   153.68720       162.6455        1         1.41         Intergenic   ARHGEF26-AS1
rs11916399   166.34160       168.2915        1         1.39         Intergenic   ZBBX, MECOM*
rs6809553    181.11490       185.8237        3         1.30         Intergenic   DNAJC19

cM, centimorgan.
*The linkage signal is also close to the gene MECOM.

Figure 3 Linkage and association analysis for a 10-Mb region surrounding the MECOM gene using combined pedigrees. The top,
middle, and bottom panels correspond to 2-point linkage, MG association analysis, and QTDT, respectively. Variants with MAF <0.01, 0.01 to 0.05,
and >0.05 are indicated by red, green, and blue colors, respectively.



American pedigrees from SAFS, we identified 4 linkage 
regions for SBP on chromosome 3. Especially for 2- 
point linkage, high-density SNP analysis is desirable. In 
linkage analysis in an identified region, we observed 
higher LOD scores using imputed sequence data com- 
pared to GWAS SNP data, particularly for common var- 
iants (Figure 3, top panel). In family-based association 
analysis of sequence variants, however, we observed 
stronger association signals at rare variants compared to 
common variants. As is typical in fine-mapping studies, 
we examined association with sequence variants under
linkage peaks obtained from a chromosome-wide scan.
Depending on the inherent power in a study, it may be 
advisable to establish a fairly liberal criterion for identifi- 
cation of linkage regions. Although the linkage strategy 
we used reduces the multiple testing burden in phase 2, 
it may miss regions of interest that would have been 
detected by a GWAS association analysis. For purposes 
of comparison, albeit in a single data set, we examined 
the results from a complete, dense GWAS scan of chro- 
mosome 3 that used mixed models to account for the 
pedigree structure [8]. We observed that both strategies 

Table 2 Top 5 linkage signals and top 5 associations with
SBP are indicated in bold in the 165- to 175-Mb region
on chromosome 3 (ordered by position)

Position     MAF      LOD score   MG p value   QTDT p value
165794197    0.403    1.54        0.200        0.611
165803609    0.402    1.54        0.215        0.543
165804946    0.400    1.62        0.257        0.522
166324439    0.118    1.63        0.755        0.749
166332595    0.118    1.57        0.637        0.695
167201711    0.0021   0.59        1.04E-04     2.66E-04
167391612    0.0037   0.55        2.94E-05     1.06E-03
172046675    0.0026   0.06        1.56E-07     5.49E-05
172516067    0.0021   0.59        1.58E-05     4.37E-05
175210951    0.0010   0.00        6.27E-07     1.03E-04


identified regions near 150 Mb and 175 Mb using a 
linkage criterion of LOD >1.0 and a GWAS criterion of 
p < 10" ; the chromosome-wide maxima near 150 Mb 
agreed quite well. Our linkage scan also identified 
regions at other locations, including those near 10, 27, 
and 100 Mb, that would have required more liberal 
GWAS criteria for identification. 
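
To give a rough sense of how restricting association testing to a linkage region reduces the multiple testing burden, the following Python sketch compares Bonferroni-style thresholds for the whole chromosome and for the 10-Mb region. The Bonferroni correction and the 0.05 error rate are assumptions made for illustration, not the criterion used in this study, and the calculation ignores both correlation among variants and the selection of the region by linkage. The variant counts are those reported in the text.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


chromosome_variants = 1215399  # sequence variants on chromosome 3
region_variants = 58651        # variants in the 165 to 175 Mb region
alpha = 0.05                   # assumed family-wise error rate

chromosome_wide = bonferroni_threshold(alpha, chromosome_variants)
region_only = bonferroni_threshold(alpha, region_variants)
fold_reduction = chromosome_variants / region_variants
```

Under these assumptions the number of tests falls roughly 20-fold, and the per-test threshold relaxes by the same factor.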